Optimizing Classifiers for Hypothetical Scenarios
Authors
Abstract
The deployment of classification models is an integral component of many modern data mining and machine learning applications. A typical classification model is built with the tacit assumption that the deployment scenario by which it is evaluated is fixed and fully characterized. Yet, in the practical deployment of classification methods, important aspects of the application environment, such as the misclassification costs, may be uncertain during model building. Moreover, a single classification model may be applied in several different deployment scenarios. In this work, we propose a method to optimize a model for uncertain deployment scenarios. We begin by deriving a relationship between two evaluation measures, H measure and cost curves, that may be used to address uncertainty in classifier performance. We show that when uncertainty in classifier performance is modeled as a probabilistic belief that is a function of this underlying relationship, a natural definition of risk emerges for both classifiers and instances. We then leverage this notion of risk to develop a boosting-based algorithm—which we call RiskBoost—that directly mitigates classifier risk, and we demonstrate that it outperforms AdaBoost on a diverse selection of datasets.
Related resources
Lipschitz Classifiers Ensembles: Usage for Classification of Target Events in C-OTDR Monitoring Systems
This paper introduces an original method for guaranteed estimation of the accuracy for an ensemble of Lipschitz classifiers. The solution was obtained as a finite closed set of alternative hypotheses, which contains an object of classification with probability of not less than the specified value. Thus, the classification is represented by a set of hypothetical classes. In this case, the smalle...
Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization
We address the problem of class imbalance in supervised grammatical error detection (GED) for non-native speaker text, which is the result of the low proportion of erroneous examples compared to a large number of error-free examples. Most learning algorithms maximize accuracy which is not a suitable objective for such imbalanced data. For GED, most systems address this issue by tuning hyperpara...
'Imagined guilt' vs 'recollected guilt': implications for fMRI.
Guilt is thought to maintain social harmony by motivating reparation. This study compared two methodologies commonly used to identify the neural correlates of guilt. The first, imagined guilt, requires participants to read hypothetical scenarios and then imagine themselves as the protagonist. The second, recollected guilt, requires participants to reflect on times they personally experienced gu...
Efficiently Learning Nonlinear Classifiers for Domain Specific Performance Measures
In practical applications, machine learning algorithms are often needed to learn classifiers that optimize domain specific performance measures. In the past, the research has focused on learning the needed classifier in isolation, yet learning nonlinear classifier for nonlinear and nonsmooth performance measures is still hard. In this paper, rather than learning the needed classifier by optimiz...
Optimizing pricing and ordering strategies in a three-level supply chain under return policy
This paper develops an economic production quantity model in a three-echelon supply chain composing of a supplier, a manufacturer and a wholesaler under two scenarios. As the first scenario, we consider a return contract between the outside supplier and the supplier and also between the manufacturer and the wholesaler, but in the second one, the return policy between the manufacturer and the wh...
Journal title:
Volume, issue:
Pages: -
Publication date: 2015